In this tutorial, you will learn how to use superheat and discover a portion of its immense variety of customization options.
The superheat vignette can be found at https://rlbarter.github.io/superheat/. This incredibly detailed vignette was created using bookdown, is hosted on GitHub Pages, and contains (almost) everything you might want to know about using superheat!
First, let’s load the packages we will be using throughout the tutorial.
# load packages
library(tidyverse)
library(devtools)
library(knitr)
library(forcats)
Next, we will install the superheat package from GitHub (using the install_github() function from the devtools package). Note that superheat is also on CRAN, but this is an older version; the most recent version will always be the GitHub development version (but it might have a few minor bugs here and there!).
# install superheat package from github
install_github("rlbarter/superheat")
# load superheat
library(superheat)
We will be using data on organ donations from 2006 to 2014, stored in the organs.csv file in the data folder. Let’s load it in:
# load in the organ donation data
organs <- read.csv("data/organs.csv")
colnames(organs) <- c("country", 2006:2014)
# view first 6 rows of the organ donation data
kable(head(organs), digits = 1)
| country | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|---|
| Argentina | 11.8 | 12.3 | 13.0 | 12.4 | 14.3 | 14.8 | 15.3 | 13.7 | 13.3 |
| Australia | NA | 9.6 | 12.3 | 11.6 | 14.0 | 14.9 | 15.5 | 16.8 | 16.0 |
| Austria | 25.2 | 22.6 | 20.5 | 25.5 | 23.3 | 24.4 | 23.6 | 24.5 | 24.9 |
| Belgium | 27.1 | 28.4 | 26.1 | 26.9 | 20.7 | 30.6 | 30.2 | 29.2 | 26.9 |
| Brazil | 5.9 | 5.5 | 6.9 | 8.0 | 9.9 | 11.2 | 12.4 | 12.7 | 13.4 |
| Bulgaria | 2.5 | 1.2 | 1.1 | 1.5 | 2.7 | 0.5 | 0.3 | 2.9 | 5.3 |
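A likely reason for the `colnames()` call is that `read.csv()` passes header names through `make.names()` by default (`check.names = TRUE`), so a header such as `2006` comes back as `X2006`. The sketch below illustrates this with the first three rows of the table above supplied inline as text; the exact header of the real organs.csv is an assumption here.

```r
# Inline CSV standing in for data/organs.csv (first three rows of the table above;
# the header layout is assumed for illustration)
csv_text <- "country,2006,2007,2008,2009,2010,2011,2012,2013,2014
Argentina,11.8,12.3,13.0,12.4,14.3,14.8,15.3,13.7,13.3
Australia,NA,9.6,12.3,11.6,14.0,14.9,15.5,16.8,16.0
Austria,25.2,22.6,20.5,25.5,23.3,24.4,23.6,24.5,24.9"

organs_raw <- read.csv(text = csv_text)
# check.names = TRUE (the default) turns "2006" into "X2006"
print(colnames(organs_raw))

# Resetting the names, as in the tutorial, restores plain year labels
organs <- organs_raw
colnames(organs) <- c("country", 2006:2014)
print(colnames(organs))
```

Passing `check.names = FALSE` to `read.csv()` would keep the year headers as they are and make the renaming step unnecessary.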
To make a heatmap in ggplot2, you need to convert your data frame to long form.
# convert organs to longform using tidyr
organs_long <- organs %>%
gather(key = "year", value = "donors", -country)
# look at the first 6 rows of the long-form dataset
kable(head(organs_long), digits = 1)
| country | year | donors |
|---|---|---|
| Argentina | 2006 | 11.8 |
| Australia | 2006 | NA |
| Austria | 2006 | 25.2 |
| Belgium | 2006 | 27.1 |
| Brazil | 2006 | 5.9 |
| Bulgaria | 2006 | 2.5 |
# look at the last 6 rows of the long-form dataset
kable(tail(organs_long), digits = 1)
| | country | year | donors |
|---|---|---|---|
| 517 | Tunisia | 2014 | 0.8 |
| 518 | Turkey | 2014 | 5.4 |
| 519 | United Kingdom | 2014 | 20.6 |
| 520 | United States of America | 2014 | 26.6 |
| 521 | Uruguay | 2014 | 20.0 |
| 522 | Venezuela (Bolivarian Republic of) | 2014 | 1.7 |
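`gather()` still works, but since tidyr 1.0.0 it is marked as superseded in favour of `pivot_longer()`. A minimal equivalent is sketched below on two rows typed in from the organs table so that it runs on its own.

```r
library(tidyr)

# Two rows from the organs table, typed in so the sketch is self-contained
organs_small <- data.frame(
  country = c("Argentina", "Australia"),
  `2006` = c(11.8, NA),
  `2007` = c(12.3, 9.6),
  check.names = FALSE
)

# Every column except country becomes a (year, donors) pair
organs_small_long <- pivot_longer(organs_small,
                                  cols = -country,
                                  names_to = "year",
                                  values_to = "donors")
print(organs_small_long)
```

The content matches the `gather()` output, but the row order differs: `pivot_longer()` keeps each country's years together, whereas `gather()` stacks one year after another.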
We could use ggplot2 to create a heatmap of the organ donations by country.
ggplot(organs_long) +
geom_raster(aes(x = year, y = country, fill = donors)) +
scale_fill_viridis_c()
Clearly some row ordering is needed! For example, we might want to order the rows by each country's average number of donors. However, while it is easy to rearrange the rows of a matrix stored in its original wide form, manipulating a matrix recorded in long form can be surprisingly difficult.
One way to do it is as follows:
organs_long <- organs_long %>%
# identify the average number of donations per country
group_by(country) %>%
mutate(avg_donors = mean(donors, na.rm = TRUE)) %>%
ungroup() %>%
arrange(avg_donors) %>%
# remove the avg_donors column
select(-avg_donors) %>%
# reorder the country factor levels
mutate(country = fct_inorder(country))
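A shorter route, sketched here as one option rather than the tutorial's method, is base R's `reorder()`, which sets a factor's levels by a per-group summary in a single call (`forcats::fct_reorder()` is the tidyverse counterpart). The small long-form data frame below is typed in from the organs table; within a dplyr pipeline the same `reorder()` call could go inside `mutate()`.

```r
# A few rows of the long-form data, typed in from the organs table
organs_long_small <- data.frame(
  country = rep(c("Argentina", "Australia", "Austria", "Brazil"), times = 3),
  year    = rep(c("2006", "2007", "2008"), each = 4),
  donors  = c(11.8,   NA, 25.2, 5.9,
              12.3,  9.6, 22.6, 5.5,
              13.0, 12.3, 20.5, 6.9)
)

# Levels are ordered by each country's mean (missing values ignored),
# lowest average first
organs_long_small$country <- reorder(
  organs_long_small$country,
  organs_long_small$donors,
  FUN = function(x) mean(x, na.rm = TRUE)
)
print(levels(organs_long_small$country))
```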
While this does the job, it is somewhat convoluted, especially for someone who is not a highly experienced R user! The re-ordered plot is shown below:
ggplot(organs_long) +
geom_raster(aes(x = year, y = country, fill = donors)) +
scale_fill_viridis_c()
When using superheat, there is no need to convert the original matrix to long form.
# convert the data to a numeric-only data frame
organs_matrix <- organs
# replace the rownames with the country names
rownames(organs_matrix) <- organs$country
organs_matrix <- organs_matrix %>% select(-country)
superheat(organs_matrix)
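The rowname-and-drop steps above can also be done in one call with `column_to_rownames()` from tibble (part of the tidyverse). The sketch uses three rows typed in from the organs table; the name `organs_matrix_small` is purely illustrative.

```r
library(tibble)

organs_small <- data.frame(
  country = c("Argentina", "Australia", "Austria"),
  `2006` = c(11.8, NA, 25.2),
  `2007` = c(12.3, 9.6, 22.6),
  check.names = FALSE
)

# Move the country column into the row names, leaving only numeric year columns
organs_matrix_small <- column_to_rownames(organs_small, var = "country")
print(organs_matrix_small)
```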
Rearranging the rows is much simpler when dealing with a wide-form matrix than with a long-form data frame.
# identify the order of the rows (increasing order of average donations)
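row_order <- apply(organs_matrix, 1, mean, na.rm = TRUE) %>% order()
# store a reordered copy; organs_matrix keeps its original row order so that it
# stays aligned with row-level data added later (such as the HDI ranks)
organs_matrix_ordered <- organs_matrix[row_order, ]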
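For row means specifically, base R's `rowMeans()` gives the same ordering as `apply(..., 1, mean, na.rm = TRUE)` in one vectorized call, and it also accepts a numeric data frame. The small matrix below is typed in from the organs table; `decreasing = TRUE` is shown for the case where the highest average should come first.

```r
m <- matrix(c(11.8, 12.3, 13.0,
                NA,  9.6, 12.3,
              25.2, 22.6, 20.5,
               5.9,  5.5,  6.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Argentina", "Australia", "Austria", "Brazil"),
                            c("2006", "2007", "2008")))

order_apply    <- order(apply(m, 1, mean, na.rm = TRUE))
order_rowmeans <- order(rowMeans(m, na.rm = TRUE))
# highest average first
order_desc     <- order(rowMeans(m, na.rm = TRUE), decreasing = TRUE)

print(rownames(m)[order_rowmeans])
```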
However, with superheat, not even this is necessary: you can simply pass the desired row order to the order.rows argument.
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
# arrange the rows in order of increasing average number of donors
order.rows = order(apply(organs_matrix, 1, mean, na.rm = TRUE)))
Much of the power of the superheat package comes from the ability to add additional information to the heatmap.
You can add additional variables to the plot via adjacent scatter, line, bar, or box plots. For adjacent plots above the heatmap, the x-axis corresponds to the column variables. Correspondingly, for adjacent plots to the right of the heatmap, the y-axis corresponds to the row variables.
In the superheatmap below, we add a line plot above the columns which corresponds to the total number of organs (per 100,000) donated over time (summing over the countries).
# adding a trendline above the heatmap
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
# arrange the rows in order of increasing average number of donors
order.rows = order(apply(organs_matrix, 1, mean, na.rm = TRUE)),
# add a plot of total organs donated across time
yt = apply(organs_matrix, 2, sum, na.rm = TRUE),
yt.axis.name = "Total organs\nfrom deceased donors",
yt.plot.type = "line",
yt.plot.size = 0.25,
yt.axis.name.size = 12)
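Because the plot above the heatmap is aligned with the columns, `yt` presumably needs exactly one value per column, in column order (an assumption based on the description above). `colSums(..., na.rm = TRUE)` computes the same totals as `apply(organs_matrix, 2, sum, na.rm = TRUE)`; the sketch checks the length on a small matrix typed in from the organs table.

```r
m <- matrix(c(11.8, 12.3, 13.0,
                NA,  9.6, 12.3,
              25.2, 22.6, 20.5,
               5.9,  5.5,  6.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Argentina", "Australia", "Austria", "Brazil"),
                            c("2006", "2007", "2008")))

# One total per column; missing values are skipped (treated as contributing 0)
yt_values <- colSums(m, na.rm = TRUE)
stopifnot(length(yt_values) == ncol(m))
print(yt_values)
```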
We can also add external information, such as the human development index (HDI) ranking for each country, as a barplot to the right of the rows.
hdi <- read.csv("data/hdi_2014.csv")
kable(head(hdi))
| X | country | year | rank | hdi |
|---|---|---|---|---|
| 1 | Argentina | 2014 | 40 | 0.836 |
| 2 | Australia | 2014 | 2 | 0.935 |
| 3 | Austria | 2014 | 23 | 0.885 |
| 4 | Belgium | 2014 | 21 | 0.890 |
| 5 | Brazil | 2014 | 75 | 0.755 |
| 6 | Bulgaria | 2014 | 59 | 0.782 |
Note that the order.rows argument applies the same ordering to yr as to the rows of the matrix, so yr should be supplied in the original (unordered) row order of organs_matrix.
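Passing `yr = hdi$rank` relies on hdi listing the countries in exactly the same order as the rows of organs_matrix. A more defensive option is to line the two up by country name, which with the real data would be `hdi$rank[match(rownames(organs_matrix), hdi$country)]`. The sketch uses rank values from the HDI table above with a deliberately shuffled row order; `organs_rows` and `hdi_small` are illustrative names.

```r
# Hypothetical matrix row order that differs from the HDI table's order
organs_rows <- c("Brazil", "Argentina", "Austria", "Australia")
hdi_small <- data.frame(
  country = c("Argentina", "Australia", "Austria", "Belgium", "Brazil"),
  rank    = c(40, 2, 23, 21, 75)
)

# match() gives, for each matrix row, that country's position in hdi_small;
# a country missing from hdi_small would produce NA
yr_rank <- hdi_small$rank[match(organs_rows, hdi_small$country)]
print(yr_rank)
```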
# add hdi as a barplot to the rows
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
# arrange the rows in order of increasing average number of donors
order.rows = order(apply(organs_matrix, 1, mean, na.rm = TRUE)),
# add a plot of total organs donated across time
yt = apply(organs_matrix, 2, sum, na.rm = TRUE),
yt.axis.name = "Total organs\nfrom deceased donors",
yt.plot.type = "line",
yt.plot.size = 0.25,
yt.axis.name.size = 12,
# add a hdi barplot
yr = hdi$rank,
yr.plot.type = "bar",
yr.axis.name = "HDI ranking",
yr.axis.name.size = 12)
Having added a bunch of information to our superheatmap, we are now ready to polish its appearance. For instance, we can change the colors of the bars, labels, and grid lines, and rotate the column labels.
# doing a bunch of stuff to make the plot prettier
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
# arrange the rows in order of increasing average number of donors
order.rows = order(apply(organs_matrix, 1, mean, na.rm = TRUE)),
# add a plot of total organs donated across time
yt = apply(organs_matrix, 2, sum, na.rm = TRUE),
yt.axis.name = "Total organs\nfrom deceased donors",
yt.plot.type = "line",
yt.plot.size = 0.25,
yt.axis.name.size = 14,
# add a hdi barplot
yr = hdi$rank,
yr.plot.type = "bar",
yr.axis.name = "HDI ranking",
yr.axis.name.size = 14,
yr.obs.col = rep("grey80", nrow(organs_matrix)),
# bottom labels
bottom.label.size = 0.05,
bottom.label.col = "white",
bottom.label.text.angle = 90,
bottom.label.text.alignment = "right",
# left labels
left.label.col = "white",
left.label.text.alignment = "right",
# grid lines
grid.vline.col = "white",
grid.vline.size = 2,
force.grid.hline = TRUE,
grid.hline.col = "white",
grid.hline.size = 0.5)
As a bonus, if you so desired, you could add the raw values to the heatmap.
# replace NAs in the matrix of values to plot on top of the heatmap
organs_text <- round(organs_matrix, 1)
organs_text[is.na(organs_text)] <- 0
organs_text <- as.matrix(organs_text)
# plot the matrix on top of the heatmap
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
X.text = organs_text)
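Replacing missing values with 0 means those cells display "0", which could be read as zero donations. If you would rather leave them blank, one option, assuming X.text accepts a character matrix with the same dimensions as the heatmap, is to format the values as text yourself. The sketch uses a small matrix typed in from the organs table.

```r
m <- matrix(c(11.8, 12.3, 13.0,
                NA,  9.6, 12.3,
              25.2, 22.6, 20.5,
               5.9,  5.5,  6.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Argentina", "Australia", "Austria", "Brazil"),
                            c("2006", "2007", "2008")))

# One decimal place for observed values, empty string for missing ones;
# ifelse() keeps the dimensions and dimnames of is.na(m)
organs_text_blank <- ifelse(is.na(m), "", sprintf("%.1f", m))
print(organs_text_blank)
```

Note that the color step that follows compares numeric values, so it would still need the numeric matrix rather than this character version.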
You could also change the color of the text so that the text for darker cells is lighter.
# set text color
organs_text_color <- organs_text
organs_text_color[organs_text < 9] <- "grey80"
organs_text_color[organs_text >= 9] <- "black"
# plot the matrix on top of the heatmap
superheat(organs_matrix,
title = "Number of organs donated by deceased donors\nper 100,000 individuals",
X.text = organs_text,
X.text.col = organs_text_color)
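The two subset assignments that build the color matrix can also be written as a single `ifelse()` call, which keeps the matrix's dimensions and row/column names. The sketch below applies the tutorial's threshold of 9 to a small matrix typed in from the organs table, with missing values set to 0 as in the tutorial.

```r
m <- matrix(c(11.8, 12.3, 13.0,
                NA,  9.6, 12.3,
              25.2, 22.6, 20.5,
               5.9,  5.5,  6.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Argentina", "Australia", "Austria", "Brazil"),
                            c("2006", "2007", "2008")))

m_text <- round(m, 1)
m_text[is.na(m_text)] <- 0

# Light text below the threshold, black text at or above it
organs_text_color_small <- ifelse(m_text < 9, "grey80", "black")
print(organs_text_color_small)
```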